Ensemble Learning Approach for Author Profiling
Abstract
With the evolution of the internet, author profiling has become a topic of great interest in fields such as forensics, security, marketing, and plagiarism detection. However, identifying an author's characteristics from a text document alone has its own limitations and challenges. This paper reports on the design, techniques, and learning models we adopted for the PAN-2014 Author Profiling challenge. To identify the age and gender of an author from a document, we employed an ensemble learning approach, training a Random Forest classifier on the English-language training data provided by the PAN organizers. Our work indicates that readability metrics, function words, and structural features play a vital role in identifying the age and gender of an author.
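As a rough illustration of the three feature families named in the abstract, the sketch below maps a document to a small feature dictionary. It is not the authors' implementation: the abstract does not list the exact features, so the choice of Flesch Reading Ease as the readability metric, the ten-word `FUNCTION_WORDS` list, the structural features, and the helper names `count_syllables` and `extract_features` are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption;
# the abstract does not say which function words were used).
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "i", "that", "it", "is", "you")


def count_syllables(word):
    """Crude syllable estimate: number of vowel groups, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def extract_features(text):
    """Map a document to readability, function-word and structural features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sent = max(1, len(sentences))   # guard against empty input
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)

    features = {
        # Readability: Flesch Reading Ease (assumed choice of metric).
        "flesch_reading_ease": 206.835
        - 1.015 * (n_words / n_sent)
        - 84.6 * (syllables / n_words),
        # Structural features (assumed examples).
        "avg_sentence_length": n_words / n_sent,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "paragraph_count": len([p for p in re.split(r"\n\s*\n", text) if p.strip()]),
    }

    # Function words: relative frequency of each listed word.
    lowered = [w.lower() for w in words]
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = lowered.count(fw) / n_words
    return features


sample = "I like the cat. The cat is in the house."
for name, value in extract_features(sample).items():
    print(name, round(value, 3))
```

Feature vectors of this kind, one per document, could then be passed to an off-the-shelf Random Forest implementation (for example scikit-learn's `RandomForestClassifier`) with age group and gender as the target labels.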
Similar resources
Language- and Subtask-Dependent Feature Selection and Classifier Parameter Tuning for Author Profiling
We present the CIC’s approach to the Author Profiling (AP) task at PAN 2017. This year’s task consists of two subtasks: gender and language variety identification in English, Spanish, Portuguese, and Arabic. We use typed and untyped character n-grams, word n-grams, and non-textual features (domain names). We experimented with various feature representations (binary, raw frequency, normalized freq...
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country, and educational background. Existing approaches to author profiling suffer from problems like high dimensionality of features and fail to capture th...
Cross-Genre Age and Gender Identification in Social Media
This paper gives a brief description of the methods adopted for the author profiling task as part of the PAN 2016 competition [1]. Author profiling is the task of predicting an author’s age and gender from his/her writing. In this paper, we follow a two-level ensemble approach to tackle the cross-genre author profiling task, where training documents and testing documents are from different g...
SeerNet@INLI-FIRE-2017: Hierarchical Ensemble for Indian Native Language Identification
Native Language Identification has played an important role in forensics primarily for author profiling and identification. In this work, we discuss our approach to the shared task of Indian Language Identification. The task is primarily to identify the native language of the writer from the given XML file which contains a set of Facebook comments in the English language. We propose a hierarchi...
ITALICA at PAN 2013: An Ensemble Learning Approach to Author Profiling Notebook for PAN at CLEF 2013
This notebook discusses the approach to the Author Profiling task developed by the Italica group for PAN 2013. This system implements two different sets of classifiers which are combined later in order to build a final classifier that takes into account the decisions of the previous ones. The initial classifiers are focused on vector space representations of the documents as a bag of words and ...